Speech coding using mixture of Gaussians polynomial model
Authors
Abstract
We have investigated a novel method of spectral estimation based on a mixture of Gaussians in a sinusoidal analysis and synthesis framework. After quantisation of this parametric scheme, a fixed frame-rate coder operating at a bit-rate of around 2.4 kbits/s has been developed. This paper describes an extension to this spectral model that constrains the parameters of the mixture of Gaussians to lie on a polynomial trajectory over a segment of speech data. This is referred to as the mixture of Gaussians polynomial model (MGPM). To realise a segmental coder, dynamic programming is performed over the utterance. The segmental representation of the spectra yields a log-likelihood score over each segment, which is used as the cost function in the dynamic programming algorithm. Speech coding components such as pitch, voicing and gain are also described segmentally. A number of segmental coders are presented with bit-rates in the range of 350 to 650 bits/s. At these bit-rates, the coders produce good, intelligible coded speech, as evaluated using DRT scoring.
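The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' coder: it segments a single scalar parameter track, fits a polynomial trajectory to each candidate segment, and uses the squared fitting error as a stand-in for the negative log-likelihood score (the two agree up to constants only under an assumed fixed-variance Gaussian error model). The names `polyfit_sse` and `segment`, the per-segment `penalty`, and the `max_len` limit are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def polyfit_sse(y: Sequence[float], order: int) -> float:
    """Squared error of a least-squares polynomial fit to y, with time scaled to [0, 1]."""
    n = len(y)
    k = min(order, n - 1) + 1  # number of coefficients that can be fitted
    t = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    # Normal equations A c = b for the basis 1, t, t^2, ...
    A = [[sum(ti ** (r + c) for ti in t) for c in range(k)] for r in range(k)]
    b = [sum(yi * ti ** r for yi, ti in zip(y, t)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for r in reversed(range(k)):
        s = b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = s / A[r][r]
    return sum(
        (yi - sum(cf * ti ** p for p, cf in enumerate(coef))) ** 2
        for yi, ti in zip(y, t)
    )


def segment(track: Sequence[float], order: int = 1, max_len: int = 10,
            penalty: float = 0.05) -> List[Tuple[int, int]]:
    """Dynamic programming over the whole track; returns (start, end) frame ranges."""
    n = len(track)
    best = [0.0] + [float("inf")] * n  # best[e]: lowest cost of covering frames 0..e-1
    back = [0] * (n + 1)               # back[e]: start of the last segment ending at e
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            # Assumed cost: fitting error plus a fixed charge per segment.
            cost = best[start] + polyfit_sse(track[start:end], order) + penalty
            if cost < best[end]:
                best[end], back[end] = cost, start
    bounds = []
    end = n
    while end > 0:
        bounds.append((back[end], end))
        end = back[end]
    return bounds[::-1]


# Toy track: two linear ramps with a jump between frames 3 and 4.
track = [0.0, 1.0, 2.0, 3.0, 10.0, 9.0, 8.0, 7.0]
boundaries = segment(track)
```

In this sketch the per-segment penalty plays the role of a bit budget: raising it favours fewer, longer segments, which is the trade-off a segmental coder has to make between bit-rate and spectral accuracy.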
Similar resources
A hybrid speech recognizer combining HMMs and polynomial classification
In this paper, we present a hybrid speech recognizer combining Hidden Markov Models (HMMs) and a polynomial classifier. In our approach the emission probabilities are not modeled as a mixture of Gaussians but are calculated by the polynomial classifier. However, we do not apply the classifier directly to the feature vector but we make use of the density values of Gaussians clustering the featur...
Advanced Acoustic Modeling with the Hybrid HMM/BN Framework
Most current state-of-the-art speech recognition systems are based on HMMs, which usually use mixtures of Gaussian functions as the state probability distribution model. It is common practice to use the EM algorithm for Gaussian mixture parameter learning. In this case, the learning is done in a "blind", data-driven way without taking into account how the speech signal has been produced and whic...
Speech modeling using variational Bayesian mixture of Gaussians
The topic of this paper is speech modeling using the Variational Bayesian Mixture of Gaussians algorithm proposed by Hagai Attias (2000). Several mixtures of Gaussians were trained for representing cepstrum vectors computed from the TIMIT database. The VB-MOG algorithm was compared to the standard EM algorithm. VB-MOG was clearly better, its convergence was faster, there was no tendency to over...
A new look at HMM parameter tying for large vocabulary speech recognition
Most current state-of-the-art large-vocabulary continuous speech recognition (LVCSR) systems are based on state-clustered hidden Markov models (HMMs). Typical systems use thousands of state clusters, each represented by a Gaussian mixture model with a few tens of Gaussians. In this paper, we show that models with far more parameter tying, like phonetically tied mixture (PTM) models, give better...
The bucket box intersection (BBI) algorithm for fast approximative evaluation of diagonal mixture Gaussians
Today, most of the state-of-the-art speech recognizers are based on Hidden Markov modeling. Using semi-continuous or continuous density Hidden Markov Models, the computation of emission probabilities requires the evaluation of mixture Gaussian probability density functions. Since it is very expensive to evaluate all the Gaussians of the mixture density codebook, many recognizers only compute th...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999